Generalization in the Face of Adaptivity: A Bayesian Perspective
Repeated use of a data sample via adaptively chosen queries can rapidly lead to overfitting, wherein the empirical evaluation of queries on the sample significantly deviates from their mean with respect to the underlying data distribution. It turns out that simple noise-addition algorithms suffice to prevent this issue, and differential privacy-based analysis of these algorithms shows that they can handle an asymptotically optimal number of queries. However, differential privacy's worst-case nature entails scaling such noise to the range of the queries even for highly concentrated queries, or introducing more complex algorithms. In this paper, we prove that straightforward noise-addition algorithms already provide variance-dependent guarantees that also extend to unbounded queries. This improvement stems from a novel characterization that illuminates the core problem of adaptive data analysis. We show that the harm of adaptivity results from the covariance between the new query and a Bayes factor-based measure of how much information about the data sample was encoded in the responses given to past queries. We then leverage this characterization to introduce a new data-dependent stability notion that can bound this covariance.
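As a minimal sketch of the kind of noise-addition mechanism the abstract refers to (the function `noisy_answer`, the threshold queries, and the `noise_scale` value below are illustrative assumptions, not the paper's actual algorithm):

```python
import numpy as np

rng = np.random.default_rng(0)

def noisy_answer(sample, query, noise_scale):
    """Answer a statistical query on the sample, with Gaussian noise added
    to limit how much each response reveals about the sample."""
    empirical_mean = np.mean([query(x) for x in sample])
    return empirical_mean + rng.normal(0.0, noise_scale)

# Toy illustration: an adaptively chosen sequence of queries, where each
# query depends on the previous (noisy) answer.
sample = rng.normal(0.0, 1.0, size=1000)
answers = []
for _ in range(5):
    shift = answers[-1] if answers else 0.0
    query = lambda x, s=shift: float(x > s)  # threshold query at the last answer
    answers.append(noisy_answer(sample, query, noise_scale=0.05))
```

The point of the abstract is that the noise scale in such a mechanism can be tied to the queries' variance rather than their worst-case range.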
A Bayesian Perspective on Training Speed and Model Selection
We take a Bayesian perspective to illustrate a connection between training speed and the marginal likelihood in linear models. This provides two major insights: first, that a measure of a model's training speed can be used to estimate its marginal likelihood. Second, that this measure, under certain conditions, predicts the relative weighting of models in linear model combinations trained to minimize a regression loss. We verify our results in model selection tasks for linear models and for the infinite-width limit of deep neural networks. We further provide encouraging empirical evidence that the intuition developed in these settings also holds for deep neural networks trained with stochastic gradient descent. Our results suggest a promising new direction towards explaining why neural networks trained with stochastic gradient descent are biased towards functions that generalize well.
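The linear-model connection the abstract describes can be illustrated with a standard identity: the log marginal likelihood of a conjugate Gaussian linear model decomposes into a sum of one-step-ahead log predictive probabilities, i.e., a measure of how quickly the model fits each new data point. The following sketch verifies that decomposition numerically (the dimensions, prior scale `alpha`, and noise level `sigma2` are arbitrary choices for illustration):

```python
import numpy as np

rng = np.random.default_rng(1)

# Conjugate Bayesian linear model: y = X w + eps, w ~ N(0, alpha I), eps ~ N(0, sigma2 I).
n, d = 20, 3
X = rng.normal(size=(n, d))
y = X @ rng.normal(size=d) + 0.1 * rng.normal(size=n)
alpha, sigma2 = 1.0, 0.01

# Exact log marginal likelihood: y ~ N(0, alpha X X^T + sigma2 I).
K = alpha * X @ X.T + sigma2 * np.eye(n)
_, logdet = np.linalg.slogdet(K)
log_ml = -0.5 * (y @ np.linalg.solve(K, y) + logdet + n * np.log(2 * np.pi))

# "Training speed" view: the same quantity as a sum of sequential
# log predictive probabilities, one per data point.
log_pred_sum = 0.0
for i in range(n):
    Ki = alpha * X[:i] @ X[:i].T + sigma2 * np.eye(i)
    k_star = alpha * X[:i] @ X[i]
    mu = k_star @ np.linalg.solve(Ki, y[:i]) if i else 0.0
    var = alpha * X[i] @ X[i] + sigma2 - (k_star @ np.linalg.solve(Ki, k_star) if i else 0.0)
    log_pred_sum += -0.5 * (np.log(2 * np.pi * var) + (y[i] - mu) ** 2 / var)

# log_ml and log_pred_sum agree up to numerical error.
```

A model that fits each successive point quickly (high sequential predictive probabilities) therefore has a high marginal likelihood, which is the intuition behind using training speed for model selection.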
Review for NeurIPS paper: A Bayesian Perspective on Training Speed and Model Selection
Weaknesses: At Eq. 5, the authors introduce two sampling-based estimators of the lower bound (LB). I am not sure why the authors introduced both as estimators for the LB: the second estimator is an unbiased estimator of the (log) marginal likelihood (ML). Though it could technically be considered a biased estimator of the LB, I do not see why it should be introduced as such, since it is an unbiased estimator of the exact value the authors are hoping to approximate. Indeed, in the following sentence the authors write that the second estimator's bias decreases as J is increased, which is very much expected, if not almost trivial, considering the point above. Another point is that when J = 1 the two estimators are algebraically the same; therefore the first one also becomes a (noisy) unbiased estimator of the ML.
Review for NeurIPS paper: A Bayesian Perspective on Training Speed and Model Selection
The work considers SGD training for Bayesian linear models and illustrates a connection between training speed and generalization, and why SGD tends to select simpler models. In particular, the work shows that a particular type of posterior sampling from gradient descent yields the same model rankings as those based on the true posterior, under suitable assumptions. Experiments on deep nets are also presented. The reviewers liked the work overall, but felt that some aspects of the exposition were unclear, that the transition to and implications for deep nets are not quite convincing (especially since there is now a better understanding of both optimization and generalization in deep nets), and that baseline comparisons (e.g., SGLD, L2 regularization, dropout) would strengthen the work.
End-to-End Learning for Stochastic Optimization: A Bayesian Perspective
Rychener, Yves, Kuhn, Daniel, Sutter, Tobias
We develop a principled approach to end-to-end learning in stochastic optimization. First, we show that the standard end-to-end learning algorithm admits a Bayesian interpretation and trains a posterior Bayes action map. Building on the insights of this analysis, we then propose new end-to-end learning algorithms for training decision maps that output solutions of empirical risk minimization and distributionally robust optimization problems, two dominant modeling paradigms in optimization under uncertainty. Numerical results for a synthetic newsvendor problem illustrate the key differences between alternative training schemes. We also investigate an economic dispatch problem based on real data to showcase the impact of the neural network architecture of the decision maps on their test performance.
- Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (1.00)
[R] A Bayesian Perspective on Q-Learning
Sounds good, looks like I'll be making another exposition then! In terms of making interactive documents like this, you have a few options; I'll list them in order of easiest to hardest (assuming you code in Python and don't know much web dev). One option is notebook forms: you can set up various toggles to run your visualizations. The one drawback is that it's not as interactive in "real time", because every time you reconfigure the parameters you have to re-run the cell to show the results. If you're interested in this approach, just add a cell, click on the three dots, and then click "Add a form".
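The "Add a form" workflow described above corresponds to Colab's `#@param` annotations, which turn ordinary assignments into form widgets; the cell body runs as plain Python either way. The parameter names below are made up for illustration:

```python
# In a Colab cell, "Add a form" attaches widgets via #@param comments.
# Changing a control updates the assignment; re-run the cell to refresh results.
learning_rate = 0.01  #@param {type:"slider", min:0.001, max:0.1, step:0.001}
n_samples = 100  #@param {type:"integer"}

print(f"re-running with lr={learning_rate}, n={n_samples}")
```

Outside Colab the `#@param` tags are inert comments, so the same cell still runs as a normal script.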